Finding Transport Proteins in a General Protein Database
Abstract
The number of specialized databases in molecular biology is growing fast, as is the availability of molecular data. These trends necessitate the development of automatic methods for finding relevant information to include in specialized databases. We show how to use a comprehensive database (SwissProt) as a source of new entries for a specialized database (TCDB, the Transport Classification Database). Even carefully constructed keyword-based queries perform poorly in determining which SwissProt records are relevant to TCDB; we show that a machine learning approach performs well. We describe a maximum-entropy classifier, trained on SwissProt records, that achieves high precision and recall in cross-validation experiments. This classifier has been deployed as part of a pipeline for updating TCDB that allows a human expert to examine only about 2% of SwissProt records for potential inclusion in TCDB. The methods we describe are flexible and general, so they can be applied easily to other specialized databases.
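A minimal sketch of the kind of classifier the abstract describes, assuming a bag-of-words feature set and plain gradient ascent. For two classes, a maximum-entropy model is equivalent to logistic regression, which is what the code fits. The feature choice, the hyperparameters, the function names, and the toy record strings are all assumptions made for illustration; they are not the authors' features, training procedure, or real SwissProt entries.

```python
import math
from collections import Counter, defaultdict


def featurize(text):
    """Bag-of-words token counts (an assumed feature set, not the paper's)."""
    return Counter(text.lower().split())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, text):
    """Probability that a record is relevant to the specialized database."""
    feats = featurize(text)
    z = bias + sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())
    return sigmoid(z)


def train_maxent(records, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit a binary maximum-entropy model by batch gradient ascent
    on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    bias = 0.0
    n = len(records)
    for _ in range(epochs):
        grad = defaultdict(float)
        grad_bias = 0.0
        for text, y in zip(records, labels):
            err = y - predict_proba(weights, bias, text)
            for tok, cnt in featurize(text).items():
                grad[tok] += err * cnt
            grad_bias += err
        for tok in set(grad) | set(weights):
            weights[tok] += lr * (grad[tok] / n - l2 * weights[tok])
        bias += lr * grad_bias / n
    return dict(weights), bias


def precision_recall(predicted, actual):
    """Precision and recall for the positive (relevant) class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy records standing in for database entries (1 = relevant).
records = [
    "sodium ion transporter membrane permease",
    "abc transporter atp-binding membrane protein",
    "potassium channel membrane transport",
    "sugar permease transporter inner membrane",
    "dna polymerase replication nucleus",
    "ribosomal protein translation cytoplasm",
    "kinase phosphorylation signaling cytoplasm",
    "transcription factor dna binding nucleus",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_maxent(records, labels)
predicted = [1 if predict_proba(weights, bias, r) > 0.5 else 0 for r in records]
precision, recall = precision_recall(predicted, labels)
```

In a pipeline like the one described, only records whose predicted probability exceeds a chosen threshold would be passed to a human expert. The 0.5 threshold here is arbitrary, and the scores above are computed on the training data rather than by cross-validation.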
Similar resources
The identification of protein changes in Celeribacter persicus SBU1 after degrading phenanthrene
Organisms in different environmental conditions express different genes, which result in different protein expressions. These changes result from the adaptation of the organism to environmental conditions such as the presence of toxic substances. This study aimed to investigate the changes in protein expression in Celeribacter persicus SBU1 isolated from Nayband Bay mangrove forests, cultured i...
Comparative proteomics analysis of a novel γ-radiation-resistant bacterium wild-type Bacillus megaterium strain WHO DQ973298 recovering from 5 kGy γ-irradiation
To examine radiation-induced proteins in an extremely radio-resistant bacterium, comparative proteomic analysis was performed for the first time on the radio-resistant wild-type strain Bacillus megaterium WHO. Variations in the cellular protein profiles of Bacillus megaterium WHO after 5 kGy γ-irradiation were analyzed by two-dimensional polyacrylamide...
Determining Difference in Evolutionary Variation of Bacterial RecA proteins vs 16SrRNA Genes by using 16s_Toxonomy Tree
Background and Aims: The rate of variation differs among the genes of a bacterial species during evolution. Therefore, in systematic bacterial studies, many researchers compare the phylogenetic tree of a particular gene with the standard tree of an rRNA gene. Given the importance of the 16S rRNA gene and the evolutionary process of the RecA protein family, we investigated the changes in the select...
A Pipeline to Automate the Updating of a Specialized Protein Database
Motivation: The growing number of specialized databases in molecular biology, coupled with the huge increase in the availability of molecular data, necessitates the development of automatic methods for finding and adding relevant information to these databases. Results: We show how a general protein database (Swiss-Prot) can be used as a source of data for a more specialized one (TCDB, the Tran...
The Effect of Low Volume High Intensity Interval Training on Sarcolemmal Content of Fatty Acid Transport Proteins (FAT/CD36 and FABPpm) in Young Men
High-intensity interval training (HIT) induces skeletal muscle metabolic and performance adaptations that resemble those of traditional endurance training despite a low total exercise volume. In addition, fatty acid oxidation in skeletal muscle increases with endurance training. This process is regulated at several sites, including the transport of fatty acids across the plasma membrane. The...
Publication date: 2007